Overview

Dataset statistics

Number of variables: 15
Number of observations: 660
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 77.5 KiB
Average record size in memory: 120.2 B
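
The average record size can be cross-checked against the other two figures. The sketch below assumes the report derives it as total in-memory size divided by the number of observations, with 1 KiB taken as 1024 bytes; the function name is illustrative only.

```python
# Values copied from the dataset statistics in this report.
TOTAL_SIZE_KIB = 77.5
N_OBSERVATIONS = 660


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Return the average number of bytes per row (assumes 1 KiB = 1024 B)."""
    return total_kib * 1024 / n_rows


avg_bytes = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
print(f"{avg_bytes:.1f} B")  # prints "120.2 B"
```

Under these assumptions the result agrees with the 120.2 B shown above.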

Variable types

Categorical: 5
Numeric: 10

Alerts

Club has a high cardinality: 180 distinct values (High cardinality)
Player Names has a high cardinality: 444 distinct values (High cardinality)
Matches_Played is highly correlated with Mins and 5 other fields (High correlation)
Mins is highly correlated with Matches_Played and 5 other fields (High correlation)
Goals is highly correlated with Matches_Played and 5 other fields (High correlation)
xG is highly correlated with Matches_Played and 4 other fields (High correlation)
xG Per Avg Match is highly correlated with Shots Per Avg Match and 1 other field (High correlation)
Shots is highly correlated with Matches_Played and 5 other fields (High correlation)
OnTarget is highly correlated with Matches_Played and 5 other fields (High correlation)
Shots Per Avg Match is highly correlated with xG Per Avg Match and 1 other field (High correlation)
On Target Per Avg Match is highly correlated with xG Per Avg Match and 1 other field (High correlation)
Year is highly correlated with Matches_Played and 4 other fields (High correlation)
Matches_Played is highly correlated with Mins and 5 other fields (High correlation)
Mins is highly correlated with Matches_Played and 5 other fields (High correlation)
Goals is highly correlated with Matches_Played and 4 other fields (High correlation)
xG is highly correlated with Matches_Played and 4 other fields (High correlation)
xG Per Avg Match is highly correlated with Shots Per Avg Match and 1 other field (High correlation)
Shots is highly correlated with Matches_Played and 5 other fields (High correlation)
OnTarget is highly correlated with Matches_Played and 6 other fields (High correlation)
Shots Per Avg Match is highly correlated with xG Per Avg Match and 2 other fields (High correlation)
On Target Per Avg Match is highly correlated with xG Per Avg Match and 2 other fields (High correlation)
Year is highly correlated with Matches_Played and 2 other fields (High correlation)
Matches_Played is highly correlated with Mins and 4 other fields (High correlation)
Mins is highly correlated with Matches_Played and 4 other fields (High correlation)
Goals is highly correlated with Matches_Played and 4 other fields (High correlation)
xG is highly correlated with Matches_Played and 4 other fields (High correlation)
Shots is highly correlated with Matches_Played and 4 other fields (High correlation)
OnTarget is highly correlated with Matches_Played and 4 other fields (High correlation)
Shots Per Avg Match is highly correlated with On Target Per Avg Match (High correlation)
On Target Per Avg Match is highly correlated with Shots Per Avg Match (High correlation)
League is highly correlated with Country (High correlation)
Country is highly correlated with League (High correlation)
Country is highly correlated with League (High correlation)
League is highly correlated with Country (High correlation)
Matches_Played is highly correlated with Mins and 5 other fields (High correlation)
Mins is highly correlated with Matches_Played and 5 other fields (High correlation)
Goals is highly correlated with Matches_Played and 8 other fields (High correlation)
xG is highly correlated with Matches_Played and 8 other fields (High correlation)
xG Per Avg Match is highly correlated with Goals and 4 other fields (High correlation)
Shots is highly correlated with Matches_Played and 8 other fields (High correlation)
OnTarget is highly correlated with Matches_Played and 7 other fields (High correlation)
Shots Per Avg Match is highly correlated with Goals and 5 other fields (High correlation)
On Target Per Avg Match is highly correlated with Goals and 5 other fields (High correlation)
Year is highly correlated with Matches_Played and 5 other fields (High correlation)
Player Names is uniformly distributed (Uniform)
Substitution has 169 (25.6%) zeros (Zeros)

Reproduction

Analysis started: 2021-11-26 17:43:14.882867
Analysis finished: 2021-11-26 17:43:40.299916
Duration: 25.42 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

Country
Categorical

HIGH CORRELATION

Distinct: 9
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.3 KiB

Length

Max length: 12
Median length: 6
Mean length: 6.333333333
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Spain
2nd row: Spain
3rd row: Spain
4th row: Spain
5th row: Spain

Common Values

Value | Count | Frequency (%)
Spain | 100 | 15.2%
Italy | 100 | 15.2%
Germany | 100 | 15.2%
Brazil | 100 | 15.2%
England | 80 | 12.1%
France | 60 | 9.1%
USA | 40 | 6.1%
Portugal | 40 | 6.1%
Netherlands | 40 | 6.1%

League
Categorical

HIGH CORRELATION

Distinct: 28
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 5.3 KiB

Length

Max length: 30
Median length: 10
Mean length: 12.77727273
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: La Liga
2nd row: La Liga
3rd row: La Liga
4th row: La Liga
5th row: La Liga

Common Values

Value | Count | Frequency (%)
La Liga | 100 | 15.2%
Serie A | 100 | 15.2%
Bundesliga | 100 | 15.2%
Campeonato Brasileiro Série A | 100 | 15.2%
Premier League | 80 | 12.1%
Primeira Liga | 40 | 6.1%
MLS | 40 | 6.1%
Eredivisie | 40 | 6.1%
France Ligue 12 | 3 | 0.5%
France Ligue 9 | 3 | 0.5%
Other values (18) | 54 | 8.2%

Length

Histogram of lengths of the category

Word frequencies

Value | Count | Frequency (%)
a | 200 | 14.3%
liga | 140 | 10.0%
la | 100 | 7.1%
serie | 100 | 7.1%
bundesliga | 100 | 7.1%
campeonato | 100 | 7.1%
brasileiro | 100 | 7.1%
série | 100 | 7.1%
premier | 80 | 5.7%
league | 80 | 5.7%
Other values (25) | 300 | 21.4%

Club
Categorical

HIGH CARDINALITY

Distinct: 180
Distinct (%): 27.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.3 KiB

Length

Max length: 29
Median length: 5
Mean length: 4.998484848
Min length: 3

Unique

Unique: 64
Unique (%): 9.7%

Sample

1st row: (BET)
2nd row: (BAR)
3rd row: (ATL)
4th row: (CAR)
5th row: (VAL)

Common Values

Value | Count | Frequency (%)
None | 34 | 5.2%
(PSG) | 14 | 2.1%
(BAR) | 13 | 2.0%
(NAP) | 13 | 2.0%
(RMA) | 11 | 1.7%
(SOC) | 11 | 1.7%
(ATA) | 11 | 1.7%
(FLA) | 11 | 1.7%
(TOT) | 11 | 1.7%
(INT) | 10 | 1.5%
Other values (170) | 521 | 78.9%

Length

Histogram of lengths of the category

Word frequencies

Value | Count | Frequency (%)
none | 34 | 5.1%
psg | 14 | 2.1%
bar | 13 | 2.0%
nap | 13 | 2.0%
rma | 11 | 1.7%
soc | 11 | 1.7%
ata | 11 | 1.7%
fla | 11 | 1.7%
tot | 11 | 1.7%
liv | 10 | 1.5%
Other values (174) | 526 | 79.1%

Player Names
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 444
Distinct (%): 67.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.3 KiB

Length

Max length: 25
Median length: 13
Mean length: 12.76212121
Min length: 2

Unique

Unique: 317
Unique (%): 48.0%

Sample

1st row: Juanmi Callejon
2nd row: Antoine Griezmann
3rd row: Luis Suarez
4th row: Ruben Castro
5th row: Kevin Gameiro

Common Values

Value | Count | Frequency (%)
Andrea Belotti | 5 | 0.8%
Lionel Messi | 5 | 0.8%
Luis Suarez | 5 | 0.8%
Andrej Kramaric | 5 | 0.8%
Ciro Immobile | 5 | 0.8%
Cristiano Ronaldo | 5 | 0.8%
Robert Lewandowski | 5 | 0.8%
Timo Werner | 5 | 0.8%
Iago Aspas | 5 | 0.8%
Fabio Quagliarella | 5 | 0.8%
Other values (434) | 610 | 92.4%

Length

Histogram of lengths of the category

Word frequencies

Value | Count | Frequency (%)
diego | 9 | 0.7%
bruno | 9 | 0.7%
kevin | 8 | 0.6%
luis | 8 | 0.6%
carlos | 8 | 0.6%
raul | 7 | 0.6%
fabio | 7 | 0.6%
iago | 7 | 0.6%
andrea | 7 | 0.6%
de | 7 | 0.6%
Other values (698) | 1173 | 93.8%

Matches_Played
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 37
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.37121212
Minimum: 2
Maximum: 38
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 2
5-th percentile: 6
Q1: 14
median: 24
Q3: 31
95-th percentile: 36
Maximum: 38
Range: 36
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 9.754657502
Coefficient of variation (CV): 0.4360361633
Kurtosis: -1.130325488
Mean: 22.37121212
Median Absolute Deviation (MAD): 7
Skewness: -0.3692394082
Sum: 14765
Variance: 95.15334299
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=37)

Most frequent values

Value | Count | Frequency (%)
31 | 32 | 4.8%
9 | 32 | 4.8%
32 | 30 | 4.5%
29 | 30 | 4.5%
23 | 30 | 4.5%
33 | 29 | 4.4%
26 | 28 | 4.2%
10 | 26 | 3.9%
24 | 26 | 3.9%
27 | 25 | 3.8%
Other values (27) | 372 | 56.4%

Smallest values

Value | Count | Frequency (%)
2 | 1 | 0.2%
3 | 5 | 0.8%
4 | 3 | 0.5%
5 | 6 | 0.9%
6 | 24 | 3.6%
7 | 22 | 3.3%
8 | 24 | 3.6%
9 | 32 | 4.8%
10 | 26 | 3.9%
11 | 12 | 1.8%

Largest values

Value | Count | Frequency (%)
38 | 4 | 0.6%
37 | 13 | 2.0%
36 | 17 | 2.6%
35 | 24 | 3.6%
34 | 21 | 3.2%
33 | 29 | 4.4%
32 | 30 | 4.5%
31 | 32 | 4.8%
30 | 24 | 3.6%
29 | 30 | 4.5%

Substitution
Real number (ℝ≥0)

ZEROS

Distinct: 21
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.224242424
Minimum: 0
Maximum: 26
Zeros: 169
Zeros (%): 25.6%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 2
Q3: 5
95-th percentile: 11
Maximum: 26
Range: 26
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.839498442
Coefficient of variation (CV): 1.190821885
Kurtosis: 4.896312268
Mean: 3.224242424
Median Absolute Deviation (MAD): 2
Skewness: 1.922155203
Sum: 2128
Variance: 14.74174829
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=21)

Most frequent values

Value | Count | Frequency (%)
0 | 169 | 25.6%
1 | 130 | 19.7%
2 | 80 | 12.1%
3 | 65 | 9.8%
5 | 43 | 6.5%
4 | 35 | 5.3%
6 | 34 | 5.2%
7 | 21 | 3.2%
8 | 20 | 3.0%
10 | 13 | 2.0%
Other values (11) | 50 | 7.6%

Smallest values

Value | Count | Frequency (%)
0 | 169 | 25.6%
1 | 130 | 19.7%
2 | 80 | 12.1%
3 | 65 | 9.8%
4 | 35 | 5.3%
5 | 43 | 6.5%
6 | 34 | 5.2%
7 | 21 | 3.2%
8 | 20 | 3.0%
9 | 10 | 1.5%

Largest values

Value | Count | Frequency (%)
26 | 1 | 0.2%
23 | 2 | 0.3%
18 | 3 | 0.5%
17 | 1 | 0.2%
16 | 1 | 0.2%
15 | 5 | 0.8%
14 | 2 | 0.3%
13 | 5 | 0.8%
12 | 10 | 1.5%
11 | 10 | 1.5%

Mins
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 583
Distinct (%): 88.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2071.416667
Minimum: 264
Maximum: 4177
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 264
5-th percentile: 574.95
Q1: 1363.5
median: 2245.5
Q3: 2822
95-th percentile: 3271.25
Maximum: 4177
Range: 3913
Interquartile range (IQR): 1458.5

Descriptive statistics

Standard deviation: 900.595049
Coefficient of variation (CV): 0.4347725223
Kurtosis: -1.052411379
Mean: 2071.416667
Median Absolute Deviation (MAD): 659
Skewness: -0.3573606633
Sum: 1367135
Variance: 811071.4422
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
3115 | 3 | 0.5%
2312 | 3 | 0.5%
872 | 3 | 0.5%
669 | 3 | 0.5%
567 | 3 | 0.5%
822 | 3 | 0.5%
903 | 3 | 0.5%
1786 | 2 | 0.3%
2822 | 2 | 0.3%
3236 | 2 | 0.3%
Other values (573) | 633 | 95.9%

Smallest values

Value | Count | Frequency (%)
264 | 1 | 0.2%
280 | 1 | 0.2%
293 | 1 | 0.2%
337 | 1 | 0.2%
356 | 1 | 0.2%
387 | 1 | 0.2%
396 | 1 | 0.2%
397 | 1 | 0.2%
451 | 1 | 0.2%
452 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
4177 | 1 | 0.2%
3931 | 1 | 0.2%
3651 | 1 | 0.2%
3641 | 1 | 0.2%
3555 | 1 | 0.2%
3533 | 2 | 0.3%
3511 | 1 | 0.2%
3491 | 1 | 0.2%
3474 | 1 | 0.2%
3448 | 1 | 0.2%

Goals
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 33
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.78484848
Minimum: 2
Maximum: 37
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 8
median: 11
Q3: 14
95-th percentile: 23
Maximum: 37
Range: 35
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.98245392
Coefficient of variation (CV): 0.507639443
Kurtosis: 2.330457684
Mean: 11.78484848
Median Absolute Deviation (MAD): 3
Skewness: 1.181334957
Sum: 7778
Variance: 35.78975491
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=33)

Most frequent values

Value | Count | Frequency (%)
11 | 67 | 10.2%
12 | 64 | 9.7%
10 | 55 | 8.3%
9 | 50 | 7.6%
4 | 48 | 7.3%
13 | 47 | 7.1%
14 | 37 | 5.6%
8 | 36 | 5.5%
15 | 30 | 4.5%
16 | 29 | 4.4%
Other values (23) | 197 | 29.8%

Smallest values

Value | Count | Frequency (%)
2 | 9 | 1.4%
3 | 14 | 2.1%
4 | 48 | 7.3%
5 | 28 | 4.2%
6 | 20 | 3.0%
7 | 25 | 3.8%
8 | 36 | 5.5%
9 | 50 | 7.6%
10 | 55 | 8.3%
11 | 67 | 10.2%

Largest values

Value | Count | Frequency (%)
37 | 1 | 0.2%
36 | 3 | 0.5%
34 | 1 | 0.2%
33 | 2 | 0.3%
31 | 3 | 0.5%
30 | 1 | 0.2%
29 | 5 | 0.8%
28 | 4 | 0.6%
26 | 3 | 0.5%
25 | 6 | 0.9%

xG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 558
Distinct (%): 84.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.08960606
Minimum: 0.71
Maximum: 32.54
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 0.71
5-th percentile: 2.2775
Q1: 6.1
median: 9.285
Q3: 13.2525
95-th percentile: 20.5105
Maximum: 32.54
Range: 31.83
Interquartile range (IQR): 7.1525

Descriptive statistics

Standard deviation: 5.72484367
Coefficient of variation (CV): 0.5674001181
Kurtosis: 1.265816144
Mean: 10.08960606
Median Absolute Deviation (MAD): 3.45
Skewness: 0.9563577449
Sum: 6659.14
Variance: 32.77383505
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
11.62 | 4 | 0.6%
3.33 | 4 | 0.6%
6.19 | 3 | 0.5%
6.11 | 3 | 0.5%
11.12 | 3 | 0.5%
14.51 | 3 | 0.5%
8.56 | 3 | 0.5%
8.7 | 3 | 0.5%
9.39 | 3 | 0.5%
11.09 | 3 | 0.5%
Other values (548) | 628 | 95.2%

Smallest values

Value | Count | Frequency (%)
0.71 | 1 | 0.2%
0.8 | 1 | 0.2%
0.96 | 1 | 0.2%
1.03 | 1 | 0.2%
1.05 | 1 | 0.2%
1.12 | 1 | 0.2%
1.13 | 1 | 0.2%
1.31 | 1 | 0.2%
1.39 | 1 | 0.2%
1.43 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
32.54 | 1 | 0.2%
31.17 | 1 | 0.2%
31.05 | 1 | 0.2%
30.6 | 1 | 0.2%
30.52 | 1 | 0.2%
29.27 | 1 | 0.2%
29 | 1 | 0.2%
28.94 | 1 | 0.2%
27.32 | 1 | 0.2%
26.65 | 1 | 0.2%

xG Per Avg Match
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 92
Distinct (%): 13.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4761666667
Minimum: 0.07
Maximum: 1.35
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 0.07
5-th percentile: 0.24
Q1: 0.34
median: 0.435
Q3: 0.57
95-th percentile: 0.8605
Maximum: 1.35
Range: 1.28
Interquartile range (IQR): 0.23

Descriptive statistics

Standard deviation: 0.1928313189
Coefficient of variation (CV): 0.404966018
Kurtosis: 1.912739847
Mean: 0.4761666667
Median Absolute Deviation (MAD): 0.105
Skewness: 1.176635667
Sum: 314.27
Variance: 0.03718391755
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
0.33 | 23 | 3.5%
0.39 | 21 | 3.2%
0.37 | 21 | 3.2%
0.4 | 20 | 3.0%
0.41 | 20 | 3.0%
0.43 | 19 | 2.9%
0.34 | 18 | 2.7%
0.57 | 18 | 2.7%
0.31 | 16 | 2.4%
0.45 | 16 | 2.4%
Other values (82) | 468 | 70.9%

Smallest values

Value | Count | Frequency (%)
0.07 | 1 | 0.2%
0.09 | 1 | 0.2%
0.15 | 2 | 0.3%
0.16 | 5 | 0.8%
0.17 | 1 | 0.2%
0.18 | 4 | 0.6%
0.19 | 2 | 0.3%
0.2 | 1 | 0.2%
0.21 | 4 | 0.6%
0.22 | 4 | 0.6%

Largest values

Value | Count | Frequency (%)
1.35 | 1 | 0.2%
1.27 | 1 | 0.2%
1.19 | 2 | 0.3%
1.16 | 1 | 0.2%
1.12 | 1 | 0.2%
1.1 | 1 | 0.2%
1.08 | 1 | 0.2%
1.06 | 4 | 0.6%
1.05 | 1 | 0.2%
1.01 | 1 | 0.2%

Shots
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 144
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.17727273
Minimum: 5
Maximum: 208
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 5
5-th percentile: 14
Q1: 37.75
median: 62
Q3: 86
95-th percentile: 124.05
Maximum: 208
Range: 203
Interquartile range (IQR): 48.25

Descriptive statistics

Standard deviation: 34.94162179
Coefficient of variation (CV): 0.5444547627
Kurtosis: 0.6264647759
Mean: 64.17727273
Median Absolute Deviation (MAD): 24
Skewness: 0.6761502446
Sum: 42357
Variance: 1220.916933
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
65 | 13 | 2.0%
21 | 13 | 2.0%
18 | 12 | 1.8%
55 | 12 | 1.8%
52 | 11 | 1.7%
81 | 11 | 1.7%
87 | 10 | 1.5%
56 | 10 | 1.5%
80 | 9 | 1.4%
54 | 9 | 1.4%
Other values (134) | 550 | 83.3%

Smallest values

Value | Count | Frequency (%)
5 | 2 | 0.3%
6 | 2 | 0.3%
7 | 2 | 0.3%
8 | 1 | 0.2%
9 | 2 | 0.3%
10 | 3 | 0.5%
11 | 3 | 0.5%
12 | 5 | 0.8%
13 | 7 | 1.1%
14 | 7 | 1.1%

Largest values

Value | Count | Frequency (%)
208 | 1 | 0.2%
197 | 1 | 0.2%
179 | 1 | 0.2%
178 | 2 | 0.3%
177 | 1 | 0.2%
170 | 1 | 0.2%
167 | 1 | 0.2%
162 | 1 | 0.2%
159 | 2 | 0.3%
152 | 1 | 0.2%

OnTarget
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 79
Distinct (%): 12.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.36515152
Minimum: 2
Maximum: 102
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 2
5-th percentile: 7
Q1: 17
median: 26
Q3: 37
95-th percentile: 58
Maximum: 102
Range: 100
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 16.36314925
Coefficient of variation (CV): 0.5768750869
Kurtosis: 2.36577599
Mean: 28.36515152
Median Absolute Deviation (MAD): 10
Skewness: 1.156283867
Sum: 18721
Variance: 267.7526532
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
25 | 26 | 3.9%
24 | 20 | 3.0%
23 | 20 | 3.0%
33 | 20 | 3.0%
19 | 20 | 3.0%
30 | 19 | 2.9%
8 | 19 | 2.9%
28 | 18 | 2.7%
17 | 17 | 2.6%
37 | 17 | 2.6%
Other values (69) | 464 | 70.3%

Smallest values

Value | Count | Frequency (%)
2 | 3 | 0.5%
3 | 4 | 0.6%
4 | 3 | 0.5%
5 | 12 | 1.8%
6 | 6 | 0.9%
7 | 17 | 2.6%
8 | 19 | 2.9%
9 | 14 | 2.1%
10 | 10 | 1.5%
11 | 15 | 2.3%

Largest values

Value | Count | Frequency (%)
102 | 1 | 0.2%
99 | 1 | 0.2%
98 | 1 | 0.2%
95 | 1 | 0.2%
91 | 1 | 0.2%
87 | 1 | 0.2%
86 | 1 | 0.2%
81 | 1 | 0.2%
79 | 1 | 0.2%
78 | 2 | 0.3%

Shots Per Avg Match
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 280
Distinct (%): 42.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.948015152
Minimum: 0.8
Maximum: 7.2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 0.8
5-th percentile: 1.709
Q1: 2.335
median: 2.845
Q3: 3.3825
95-th percentile: 4.54
Maximum: 7.2
Range: 6.4
Interquartile range (IQR): 1.0475

Descriptive statistics

Standard deviation: 0.9149064812
Coefficient of variation (CV): 0.3103466007
Kurtosis: 1.985334359
Mean: 2.948015152
Median Absolute Deviation (MAD): 0.53
Skewness: 0.9473020506
Sum: 1945.69
Variance: 0.8370538693
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
2.47 | 8 | 1.2%
2.86 | 8 | 1.2%
2.75 | 7 | 1.1%
2.6 | 7 | 1.1%
3.05 | 7 | 1.1%
2.22 | 7 | 1.1%
2.65 | 6 | 0.9%
3.02 | 6 | 0.9%
2.49 | 6 | 0.9%
2.26 | 6 | 0.9%
Other values (270) | 592 | 89.7%

Smallest values

Value | Count | Frequency (%)
0.8 | 1 | 0.2%
0.81 | 1 | 0.2%
0.82 | 1 | 0.2%
0.85 | 1 | 0.2%
0.99 | 1 | 0.2%
1.03 | 1 | 0.2%
1.16 | 1 | 0.2%
1.2 | 2 | 0.3%
1.24 | 1 | 0.2%
1.27 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
7.2 | 1 | 0.2%
7.12 | 1 | 0.2%
6.32 | 1 | 0.2%
6.22 | 1 | 0.2%
5.99 | 1 | 0.2%
5.97 | 1 | 0.2%
5.89 | 1 | 0.2%
5.84 | 2 | 0.3%
5.67 | 1 | 0.2%
5.59 | 1 | 0.2%

On Target Per Avg Match
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 184
Distinct (%): 27.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.315651515
Minimum: 0.24
Maximum: 3.63
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.3 KiB

Quantile statistics

Minimum: 0.24
5-th percentile: 0.72
Q1: 0.98
median: 1.25
Q3: 1.54
95-th percentile: 2.241
Maximum: 3.63
Range: 3.39
Interquartile range (IQR): 0.56

Descriptive statistics

Standard deviation: 0.4742392996
Coefficient of variation (CV): 0.3604596613
Kurtosis: 2.239211943
Mean: 1.315651515
Median Absolute Deviation (MAD): 0.28
Skewness: 1.180931921
Sum: 868.33
Variance: 0.2249029133
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
1.25 | 13 | 2.0%
0.94 | 12 | 1.8%
1.28 | 12 | 1.8%
1.21 | 12 | 1.8%
0.96 | 11 | 1.7%
1.3 | 10 | 1.5%
1.29 | 10 | 1.5%
1 | 10 | 1.5%
1.11 | 9 | 1.4%
1.26 | 9 | 1.4%
Other values (174) | 552 | 83.6%

Smallest values

Value | Count | Frequency (%)
0.24 | 1 | 0.2%
0.29 | 1 | 0.2%
0.4 | 1 | 0.2%
0.48 | 2 | 0.3%
0.5 | 1 | 0.2%
0.52 | 1 | 0.2%
0.54 | 1 | 0.2%
0.55 | 2 | 0.3%
0.56 | 4 | 0.6%
0.58 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
3.63 | 1 | 0.2%
3.11 | 2 | 0.3%
3.04 | 1 | 0.2%
2.94 | 1 | 0.2%
2.9 | 1 | 0.2%
2.89 | 2 | 0.3%
2.85 | 1 | 0.2%
2.84 | 1 | 0.2%
2.83 | 1 | 0.2%
2.77 | 1 | 0.2%

Year
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.3 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016
2nd row: 2016
3rd row: 2016
4th row: 2016
5th row: 2016

Common Values

Value | Count | Frequency (%)
2019 | 200 | 30.3%
2020 | 160 | 24.2%
2018 | 120 | 18.2%
2016 | 100 | 15.2%
2017 | 80 | 12.1%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
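
The calculation described above can be sketched in plain Python. This is an illustration rather than the code the report itself runs: the helper names and the toy data are hypothetical, and ties are given their average rank, which is one common convention.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson(x, y)
print(rho, r)
```

On this toy input ρ is 1 up to floating-point rounding, while r stays below 1 because the relationship is monotonic but not linear.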

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
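
A minimal sketch of this definition, using hypothetical toy data, also shows the invariance under changes of location and scale. The scale factors below are positive; a negative factor would flip the sign of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data, not taken from the profiled dataset.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_original = pearson_r(x, y)

# Shift and rescale each variable separately; r should be unchanged.
x_shifted = [3 * a + 2 for a in x]
y_shifted = [0.5 * b - 7 for b in y]
r_transformed = pearson_r(x_shifted, y_shifted)

print(r_original, r_transformed)
```

Both values come out at about 0.8, since the covariance here is 8 and each sum of squared deviations is 10.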

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
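
The pair-counting rule can be written out directly. The sketch below implements the simple form stated above (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations may apply a tie correction, so their results can differ when ties are present. The function name and data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: 10 pairs, of which 8 are concordant and 2 discordant.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
print(tau)  # 0.6
```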

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
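
As a rough illustration, the bias-corrected estimator can be computed from a contingency table as follows. This is a sketch of the commonly cited form of Bergsma's correction; whether it matches the report's implementation line for line is an assumption, and the function name and tables are hypothetical. It also assumes no row or column of the table is entirely zero.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Hypothetical tables: one perfectly associated, one independent.
v_perfect = cramers_v_corrected([[10, 0], [0, 10]])
v_independent = cramers_v_corrected([[5, 5], [5, 5]])
print(v_perfect, v_independent)
```

The first table gives a value of 1 up to rounding and the second gives 0, matching the two ends of the range described above.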

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

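Index | Country | League | Club | Player Names | Matches_Played | Substitution | Mins | Goals | xG | xG Per Avg Match | Shots | OnTarget | Shots Per Avg Match | On Target Per Avg Match | Year
0 | Spain | La Liga | (BET) | Juanmi Callejon | 19 | 16 | 1849 | 11 | 6.62 | 0.34 | 48 | 20 | 2.47 | 1.03 | 2016
1 | Spain | La Liga | (BAR) | Antoine Griezmann | 36 | 0 | 3129 | 16 | 11.86 | 0.36 | 88 | 41 | 2.67 | 1.24 | 2016
2 | Spain | La Liga | (ATL) | Luis Suarez | 34 | 1 | 2940 | 28 | 23.21 | 0.75 | 120 | 57 | 3.88 | 1.84 | 2016
3 | Spain | La Liga | (CAR) | Ruben Castro | 32 | 3 | 2842 | 13 | 14.06 | 0.47 | 117 | 42 | 3.91 | 1.40 | 2016
4 | Spain | La Liga | (VAL) | Kevin Gameiro | 21 | 10 | 1745 | 13 | 10.65 | 0.58 | 50 | 23 | 2.72 | 1.25 | 2016
5 | Spain | La Liga | (JUV) | Cristiano Ronaldo | 29 | 0 | 2634 | 25 | 24.68 | 0.89 | 162 | 60 | 5.84 | 2.16 | 2016
6 | Spain | La Liga | (RMA) | Karim Benzema | 23 | 6 | 1967 | 11 | 13.25 | 0.64 | 69 | 34 | 3.33 | 1.64 | 2016
7 | Spain | La Liga | (PSG) | Neymar | 30 | 0 | 2694 | 13 | 13.33 | 0.47 | 105 | 42 | 3.70 | 1.48 | 2016
8 | Spain | La Liga | (CEL) | Iago Aspas | 25 | 7 | 2354 | 19 | 13.88 | 0.56 | 78 | 37 | 3.15 | 1.49 | 2016
9 | Spain | La Liga | (EIB) | Sergi Enrich | 31 | 7 | 2904 | 11 | 8.25 | 0.27 | 64 | 26 | 2.09 | 0.85 | 2016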

Last rows

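Index | Country | League | Club | Player Names | Matches_Played | Substitution | Mins | Goals | xG | xG Per Avg Match | Shots | OnTarget | Shots Per Avg Match | On Target Per Avg Match | Year
650 | Netherlands | Eredivisie | (AJA) | Klaas-Jan Huntelaar | 6 | 12 | 938 | 9 | 6.91 | 0.70 | 32 | 17 | 3.24 | 1.72 | 2020
651 | Netherlands | Eredivisie | (WIL) | Vangelis Pavlidis | 25 | 0 | 2265 | 11 | 12.64 | 0.53 | 70 | 31 | 2.94 | 1.30 | 2020
652 | Netherlands | Eredivisie | (EMM) | Michael de Leeuw | 26 | 0 | 2383 | 9 | 8.28 | 0.33 | 51 | 23 | 2.03 | 0.92 | 2020
653 | Netherlands | Eredivisie | (PSV) | Donyell Malen | 14 | 0 | 1245 | 11 | 8.91 | 0.68 | 59 | 32 | 4.50 | 2.44 | 2020
654 | Netherlands | Eredivisie | (RZA) | Haris Vuckic | 23 | 2 | 2194 | 11 | 6.00 | 0.26 | 38 | 17 | 1.65 | 0.74 | 2020
655 | Netherlands | Eredivisie | (UTR) | Gyrano Kerk | 24 | 0 | 2155 | 10 | 7.49 | 0.33 | 50 | 18 | 2.20 | 0.79 | 2020
656 | Netherlands | Eredivisie | (AJA) | Quincy Promes | 18 | 2 | 1573 | 12 | 9.77 | 0.59 | 56 | 30 | 3.38 | 1.81 | 2020
657 | Netherlands | Eredivisie | (PSV) | Denzel Dumfries | 25 | 0 | 2363 | 7 | 5.72 | 0.23 | 45 | 14 | 1.81 | 0.56 | 2020
658 | Netherlands | Eredivisie | None | Cyriel Dessers | 26 | 0 | 2461 | 15 | 14.51 | 0.56 | 84 | 43 | 3.24 | 1.66 | 2020
659 | Netherlands | Eredivisie | (PSV) | Cody Gakpo | 14 | 11 | 1557 | 7 | 4.43 | 0.27 | 38 | 15 | 2.32 | 0.92 | 2020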